Data processing apparatus with parallel operating functional units

EPO - DG 1 

11. 10. 2002




The invention relates to a data processing apparatus, such as a VLIW (Very
Long Instruction Word) processor, that is capable of executing a plurality of instructions
from an instruction word in parallel.

A VLIW processor makes it possible to execute programs with a high degree
of instruction parallelism. Conventionally, in each instruction cycle the VLIW processor uses 
a program counter to fetch an instruction word that contains a fixed number, greater than one, 
of instructions (often called operations). The VLIW processor executes these operations in 
parallel in the same instruction cycle (or cycles). 

For this purpose the VLIW processor contains a plurality of functional units,
each capable of executing one of the operations from the instruction word at a time. Different
kinds of functional units are typically provided, such as ALUs (arithmetic logic units),
multipliers, branch control units, memory access units, etc. Often dedicated purpose
functional units are also included, designed to speed up programs for particular
applications. Thus, for example, functional units for performing parts of MPEG encoding or
decoding may be added. In advanced VLIW processors hundreds of functional units may be
present. In principle, the instruction word must contain instructions for all of these functional
units in parallel.

As the abbreviation VLIW indicates, this leads to very wide instruction words.
As a result, a considerable amount of instruction memory is needed to store programs of such
instructions, especially when the programs executed by the VLIW processor contain many
instructions. This leads to increased costs.

Several measures have been proposed to reduce these costs. For example, the
functional units have been organized into groups of one or more functional units, so that the
instruction word provides one instruction per group. This limits the number of instructions
that have to be included in the instruction word (when at least some of the groups contain more
than one functional unit) and thereby reduces instruction memory size, without reducing
the number of functional units that can receive instructions. However, such a grouping
reduces the number of functional units that can execute instructions in parallel. Thus, there is
always a direct trade-off between the amount of parallelism and memory size.

More generally, the inflexibility with which instructions have to be combined
into instruction words in VLIW processors also increases the size of the memory used by
programs. For example, when a program must support alternative execution of different
combinations of instructions in an instruction cycle, instruction words for all possible
combinations have to be stored. When one part of the functional units must execute the same
loop of instructions repeatedly a number of times while other functional units execute
progressively different instructions, conventional VLIW processors only have the option of
choosing between unrolling the loop, i.e. enlarging the program by combining repeated
copies of the instructions of the loop with progressively different instructions, or executing
the loop and the progressive instructions in different instruction cycles. The former greatly
increases the required instruction memory size and is not possible if the number of iterations
in the loop is not known in advance. The latter reduces the amount of parallelism and thereby
increases execution time.

Among others, it is an object of the invention to increase the flexibility with
which instructions can be combined in instruction words of VLIW processors.

Among others, it is another object of the invention to reduce the amount of 
instruction memory needed in VLIW processors. 

The invention provides for a data processing apparatus according to Claim 1.
This data processing apparatus is of a type, such as a VLIW processor, that uses a central
control of program flow with an instruction address that addresses an instruction word that
contains a plurality of instructions for different functional units. However, during program
execution the translation of the instruction address into physical addresses can be modified
selectively for a particular one of the functional units or a group of the functional units
individually.

This can be used to change the way instruction words are composed from
instructions from different memory units during program execution. Thus, when part of a
first VLIW instruction word in a program is a copy of part of a second VLIW instruction
word in that program, it is not necessary to provide complete storage locations for both
words. Instead, a modification of the address translation between supplying the instruction
address for the first instruction word and supplying the instruction address for the second
instruction word makes it possible to re-use the stored part of the first instruction word in the
second instruction word. Thus, less memory is needed to store a program. This may be
applied to implement loops wherein part of the functional units have to execute a loop body
repeatedly while other functional units execute progressive instructions, or to implement
"if-then-else" constructs, wherein part of the functional units conditionally execute alternative
instructions while other functional units execute the same instructions irrespective of the condition.

Preferably, the modification of address translation is controlled by
modification update instructions that are part of the instructions of the program. The
modification update instructions may be contained in the instructions from the particular one
of the memory units, or in instructions from one or more memory units other than the
particular one of the memory units. Thus, a flexible control over looping and conditional
execution can be realized. However, without deviating from the invention, control of the
modification of translation may be realized outside the program, for example using one or
more memory management units that provide for translation of instruction addresses that may
differ for different memory units. The same instruction address can then be translated
differently for different memory units, with the possibility of translating different instruction
addresses each to the same physical address for one memory unit that supplies one part of the
instruction word and to different physical addresses for other memory units that supply
other parts of the instruction word. This is particularly useful for implementing loops of
instructions that have to be supplied repeatedly a predetermined number of times by only part
of the memory units.

In a further embodiment, at least the particular one of the memory units has
memory locations only for an address range that is smaller than a range of addresses for
which another one of the memory units has locations available. When the instruction address
leads to addresses outside this range, the functional unit(s) that get their instructions from the
particular one of the memory units may be deactivated, for example by supplying default
No-op instructions, or by disabling the functional unit(s). When deactivated, the functional unit is
preferably switched to a power saving state, for example by disabling clock signals in the
functional unit.

The data processing apparatus can be used to execute a program that involves 
executing a loop for some of the functional units during instruction cycles when other 
functional units execute progressive instructions. Each time when a loop back occurs, the 
translation of the instruction address for the functional units that are involved in the loop is 
modified so that the instructions from the loop are fetched repeatedly from the same 
locations. 

The translation of the common instruction address into memory addresses may
be changed for more than one functional unit or group of functional units. Thus multiple
loops with mutually different numbers of instructions may be executed in parallel.

These and other advantageous aspects of the data processing apparatus and
method according to the invention will be described in more detail using the following
figures.

Figure 1 shows a data processing apparatus 
Figure 2 illustrates execution of a loop of instructions 
Figure 3 shows a data processing apparatus
Figure 4 illustrates an example of program flow
Figure 5 shows a data processing apparatus

Figure 1 shows a processing apparatus that contains a memory system 10, with
memory units 12a-c, a controller 14, and an instruction execution unit 7 that contains groups
70a-c of functional units 18a-e, a register file 72, an instruction address counter unit 74, an
offset adder 15a and an offset register 15b. Instruction address counter unit 74 has an
instruction address output coupled to controller 14. Controller 14 has address outputs coupled
to respective memory units 12a-c. One of the address outputs is coupled to memory unit 12c
via offset adder 15a. Memory units 12a-c have instruction outputs coupled to respective ones
of groups 70a-c and to register file 72. Register file 72 has operand/result output/input ports (not
shown separately) coupled to groups 70a-c. Groups 70a-c each contain one or more
functional units 18a-e, which all have operation code inputs coupled to memory units 12a-c,
operand inputs coupled to register file 72 and result outputs coupled to register file 72 (all
being symbolized by a single connection between memory units 12a-c, groups 70a-c of
functional units 18a-e and the register file 72). One of the groups 70b has a branch address
output coupled to instruction address counter unit 74. Another one of the groups 70c has an
offset adjust output coupled to offset register 15b, which in turn has an offset output coupled
to offset adder 15a.

In operation the processing apparatus operates in successive instruction cycles,
in which address counter unit 74 outputs addresses of successive instructions to controller 14
(these addresses will be called "successive" because the corresponding instructions are
executed successively, although in the case of branches the addresses may not be successive).
Controller 14 outputs further instruction addresses derived from the instruction address to
memory units 12a-c. One of the further instruction addresses is modified by offset adder 15a,
which adds an offset from offset register 15b to the further instruction address. The
(modified) further instruction addresses address instruction memory locations in memory
units 12a-c. Memory units 12a-c output the addressed instructions to instruction execution unit 7.
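
By way of illustration only, the address formation described above can be sketched in Python. The names MemoryUnit and fetch_instruction_word, the list-based memory contents and the instruction labels are assumptions made for this sketch and are not part of the apparatus of Figure 1. The offsets list stands in for offset register 15b and offset adder 15a, with a zero entry for memory units that have no adder.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MemoryUnit:
    """Hypothetical model of one instruction memory unit (12a, 12b or 12c)."""
    instructions: List[str]

    def fetch(self, address: int) -> str:
        # Return the instruction field stored at the given physical address.
        return self.instructions[address]


def fetch_instruction_word(instruction_address: int,
                           units: Sequence[MemoryUnit],
                           offsets: Sequence[int]) -> Tuple[str, ...]:
    """Combine one instruction field per memory unit into an instruction word.

    offsets[i] models the value of an offset register that is added to the
    common instruction address for unit i; 0 leaves the address unmodified.
    """
    return tuple(unit.fetch(instruction_address + offset)
                 for unit, offset in zip(units, offsets))


# Illustrative contents: three memory units with three locations each.
units = [MemoryUnit(["a0", "a1", "a2"]),
         MemoryUnit(["b0", "b1", "b2"]),
         MemoryUnit(["c0", "c1", "c2"])]
word = fetch_instruction_word(2, units, [0, 0, -2])
```

In this sketch the first two memory units are addressed with the unmodified instruction address 2, while the third is addressed two locations lower, so that the instruction word combines fields from different locations.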

The combination of instructions output from memory units 12a-c forms an
instruction word with fields for the various instructions. Each group 70a-c of functional units
18a-e receives an instruction from a respective one of memory units 12a-c. The functional
units 18a-e of the group 70a-c determine which of the functional units 18a-e of the group
70a-c should execute the instruction from the corresponding memory unit 12a-c, and that
functional unit reads operands addressed by the instruction from register file 72 (if any) and
supplies results to register file 72 (if any).

As shown, one of the groups 70a-c has a connection from a branch functional 
unit 18d to update the instruction address in instruction address counter unit 74 in response to 
an instruction. Branch functional unit 18d executes this update for example when it 
determines that some condition has been met. Updates may be absolute (replacement of 
program counter value in address counter unit 74) or relative (addition to the program counter 
value). A single connection is shown by way of example. In practice more than one group 
70a-c may contain one or more branch functional units coupled to instruction address counter 
unit 74. 

Updates of the program counter (instruction address) in address counter unit
74 by branch functional unit 18d affect program (instruction address) flow for all groups of
functional units 70a-c. Offset adder 15a and offset register 15b provide for a way of affecting
program (instruction address) flow for one of the groups of functional units 70c individually.
For this purpose, a group 70c contains a local branch functional unit 18e, which processes
local branch instructions basically in the same way as a conventional branch functional unit
18d, except that local branch functional unit 18e does not update the program counter value
in the overall address counter unit. Instead, local branch functional unit 18e updates an offset
value in offset register 15b, which is used to modify the address supplied to the memory unit
12a-c of one of the groups of functional units 70a-c. Thus, the offset can be altered during
program execution dependent on conditions that occur during execution.

Figure 2 illustrates how this may be used for executing a loop of instructions
repeatedly with one group of functional units 70c, while the other groups of functional units
70a,b execute progressive instructions. In this figure instruction address values A are plotted
vertically and instruction cycle number, representative of time, is plotted horizontally. A first
line 20 illustrates the progressive instruction addresses from program counter unit 74 that are
successively supplied to part of the memory units 12a,b. A second line 22 illustrates looping
addresses that are successively supplied to one of the memory units 12c in parallel with the
progressive addresses.

In this case, one memory unit 12c contains the instructions from the loop and
other memory units 12a-b contain the progressive instructions. Initially, the offset value in
offset register 15b is for example zero and all memory units 12a-c receive the same address.
At the end of the loop one memory unit 12c outputs a branch instruction for local branch
functional unit 18e, which in response subtracts an offset value from the offset in offset
register 15b. The offset equals the offset between the start of the loop and the branch
instruction. As a result, although the program counter value in program counter unit 74
continues to increase, one of the memory units 12c starts to repeat fetching of instructions
from the start of the loop. Once the instructions from the loop have been executed a sufficient
number of times, local branch functional unit 18e does not cause the offset value to be
subtracted. Instead, implicitly with the absence of branch back, or in response to a subsequent
instruction, local branch functional unit 18e may reset the offset in offset register 15b to zero
or to some other appropriate value.
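
The looping behaviour of Figure 2 can be illustrated with a minimal Python sketch. It assumes a non-pipelined model in which the program counter advances by one per instruction cycle and an offset update takes effect in the next cycle, so that the amount subtracted is the distance from the loop start to the branch instruction plus one. The function name run_with_local_loop and its parameters are hypothetical and chosen for this sketch only.

```python
from typing import List, Tuple


def run_with_local_loop(n_cycles: int, loop_start: int, loop_end: int,
                        iterations: int) -> List[Tuple[int, int]]:
    """Return (program counter, local address) pairs for the looping unit.

    The local address is the program counter plus the offset register value.
    The local branch sits at loop_end; it subtracts the loop length from the
    offset while further iterations remain and otherwise falls through.
    """
    offset = 0                                # offset register, initially zero
    remaining = iterations - 1                # branch-backs still to be taken
    loop_length = loop_end - loop_start + 1   # assumed amount to subtract
    trace = []
    for pc in range(n_cycles):
        local = pc + offset                   # offset adder
        trace.append((pc, local))
        if local == loop_end and remaining > 0:
            offset -= loop_length             # local branch back
            remaining -= 1
    return trace


trace = run_with_local_loop(n_cycles=8, loop_start=1, loop_end=2, iterations=3)
local_addresses = [local for _, local in trace]
```

Here the program counter runs from 0 to 7 while the local addresses visit locations 1 and 2 three times before continuing. The sketch leaves the offset unchanged after the last iteration; resetting it to zero or to another value, as mentioned above, would be a separate update.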

The invention is not limited to loops, however. For example, "if-then-else"
constructs for part of the functional units may be supported, by updating the offset value in
offset register 15b dependent on whether the "then"-clause or the "else"-clause must be
executed. Similar techniques may be applied to "switch by case" constructs.
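
As a hedged illustration of the "if-then-else" case, the following Python sketch selects an offset so that one memory unit fetches either a "then" block or an "else" block while the common instruction address runs on. The function names, the block start addresses and the block length are assumptions made for this sketch.

```python
from typing import List


def select_offset(condition: bool, pc_at_entry: int,
                  then_start: int, else_start: int) -> int:
    """Offset that maps pc_at_entry onto the start of the selected block."""
    target = then_start if condition else else_start
    return target - pc_at_entry


def local_addresses(condition: bool, pc_at_entry: int, then_start: int,
                    else_start: int, block_len: int) -> List[int]:
    """Local addresses fetched while the program counter steps through the block."""
    offset = select_offset(condition, pc_at_entry, then_start, else_start)
    return [pc + offset for pc in range(pc_at_entry, pc_at_entry + block_len)]


then_case = local_addresses(True, 10, then_start=10, else_start=14, block_len=3)
else_case = local_addresses(False, 10, then_start=10, else_start=14, block_len=3)
```

In both cases the other memory units would be addressed with program counter values 10 to 12, so that no duplicate instructions need to be stored for them.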

As shown in figure 2, the instruction addresses from program counter unit 74
progress uniformly. Of course, branches may cause deviations from this uniform progression.
Preferably, therefore, a compiler that generates the instruction words for execution by the
processing apparatus prevents overall branch instructions (in particular conditional branch
instructions) during parts of the program where it is required that one group of functional
units executes instructions from addresses with a selectable offset from the overall program
counter. In an embodiment the compiler that generates the instructions for execution by the
processing apparatus adds offset changing instructions only after checking that no such
program counter branches occur in the part of the program where the offset is applied, or
conversely, avoids program counter branch instructions in parts of the program where an
offset is applied. (This may be implemented for example using so-called "if-conversion", that
is, by implementing an "if then I1 else I2" construct by including and executing in the
program both the instructions I1 that have to be executed in the "then" case and those
instructions I2 that have to be executed in the "else" case (if any) and making completion of
execution of each of these instructions I1, I2 conditional on some guard bit value that has
been computed from the "if" condition.)

However, the compiler may also insert corresponding local branches to update
the offset so that it undoes the effect of branches of the overall program counter value. Thus,
a branching or looping behavior of the instruction addresses from program counter unit 74
can be combined with a steady progression of local addresses applied to one of memory units
12a-c.

Although it has been assumed in this explanation that controller 14 supplies
the same addresses to all memory units 12a-c, this is not in fact necessary. Without deviating
from the invention controller 14 may apply different forms of mapping. Similarly, although
local branch functional unit 18e has been shown in the group of functional units 70c whose
instruction address is modified, it should be understood that without deviating from the
invention local branch functional unit 18e may be located in any of the groups of functional
units 70a-c. Furthermore, although only one local branch functional unit 18e has been shown for one
memory unit 12c, it should be understood that more than one local branch unit may be
provided for modifying the offset for the same memory unit 12c, e.g. for modifying the offset
in offset register 15b or in a plurality of offset registers.

Figure 3 shows a data processing apparatus in which circuits 300 for the
modification of the addresses are provided for more than one of the memory units 12a-c.
Without deviating from the invention, such circuits may also be provided only for a subset of
one or more memory units 12a-c. As shown, the offset provided by each of these circuits is
controlled by a respective local branch functional unit 18e. However, without deviating from
the invention, the local branch unit 18e may be constructed to execute instructions that
select for which memory unit 12a-c the offset should be modified. Also a shared address
modification circuit may be used for a plurality of the memory units 12a-c.

Figure 4 illustrates an example of program flow where one group of functional
units executes progressive instructions and other groups of functional units execute
repetitions of respective loop bodies, preceded and followed by execution of progressive
instructions. During execution a program counter value PC steadily increases from top to
bottom in the figure. Instruction execution for respective ones of the groups of functional
units is indicated in respective columns 400a-c. First column 400a shows first and second
blocks 402, 404 of successively executed instructions. Second and third columns 400b,c show
repeated execution of blocks 406, 408. Because of data dependencies, repeated execution of
blocks 406, 408 can start only after execution of a certain instruction in first block 402,
whereas execution of second block 404 can start only after a certain instruction in repeatedly
executed block 408 has been executed for the last time. Between execution of the last
instruction of first block 402 and the start of execution of the first instruction of second block
404, first column 400a contains an intermediate block 403 wherein only No-ops may be executed.

According to the invention, this type of execution is realized by updating the
offset for the functional units that execute repeatedly executed blocks 406, 408 each time
when starting execution of a new repetition of the relevant block. Thus, the instruction
address locations that contain the instructions from blocks 406, 408 are repeatedly addressed,
although overall the program counter PC value steadily increases.

During execution of intermediate block 403 the group of functional units that
executes instructions from first column 400a may receive No-ops from successively higher
locations in its corresponding instruction memory unit. Thus, a number of No-ops should be
provided in that instruction memory unit that corresponds to the repeated execution of instructions
from blocks 406, 408 in the other functional units. Alternatively, the offset for the instruction
memory with instructions of first column 400a may also be periodically updated so as to
provide for repeated "execution" of a body of No-ops, or loop back instructions, in
intermediate block 403. Thus, memory space is saved. Preferably this body should be as short
as possible and it should be repeated as many times as possible in intermediate block 403 to
fill this block as much as possible with repeated executions.

When the apparatus is arranged to permit updates of the offset registers of
groups of functional units that execute instructions from the second or third column 400b,c
by instructions from the group of functional units that executes instructions from first column
400a, the first column 400a (including intermediate block 403) may include instructions that
cause repeated execution of block 406 and/or block 408. Otherwise, these instructions may
be included in those blocks.

When the invention is used it is not necessary that the address ranges of
memory units 12a-c are co-extensive: memory units 12a-c may contain mutually different
numbers of addressable locations. Thus, for example, a first one of the memory units 12a-c
that contains parts of instruction words that are intended for a group of functional units 70a
that performs a set of general purpose instructions, including branches and ALU (Arithmetic
Logic Unit) instructions, may provide for a greater number of instruction addresses than
memory units 12a-c that supply instructions to other, more specialized, groups of functional
units. In this case, the general purpose functional units may generally be active to execute
instructions during execution of a program, whereas the more specialized functional units
may be active only intermittently during execution of a program, or they may
repeatedly execute a loop of instructions while the general purpose functional units execute
progressively different instructions.

When a group of functional units 70a-c does not need to execute instructions,
the offset used to compute the memory address for the memory unit 12a-c of that group may
be repeatedly updated so as to limit the range of addresses applied to that memory unit, so
that the addresses stay in range for that memory unit 12a-c. However, preferably, the
relevant memory unit 12a-c is at least partly disabled when its group of functional units 70a-c
does not need to execute instructions during a part of a program. Thus, power dissipation can
be reduced. In this case, the instruction addresses may run on, through a range of address
values for which the relevant memory unit 12a-c does not store instructions.
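
A minimal Python sketch of a memory unit with a reduced address range is given below, assuming that out-of-range addresses yield a default No-op and that a simple flag can stand in for the power saving state. The class name SmallMemoryUnit, the NOP constant and the enabled attribute are hypothetical names introduced for this sketch.

```python
from typing import Sequence

NOP = "nop"  # assumed default instruction supplied when the unit is inactive


class SmallMemoryUnit:
    """Hypothetical memory unit that stores fewer locations than the others."""

    def __init__(self, instructions: Sequence[str]) -> None:
        self.instructions = list(instructions)
        self.enabled = True  # stands in for the power state of the unit

    def fetch(self, address: int) -> str:
        # Addresses outside the stored range disable the unit for this cycle
        # and produce a No-op instead of a stored instruction.
        in_range = 0 <= address < len(self.instructions)
        self.enabled = in_range
        return self.instructions[address] if in_range else NOP


unit = SmallMemoryUnit(["mul", "add"])
first = unit.fetch(1)
first_enabled = unit.enabled
second = unit.fetch(5)
second_enabled = unit.enabled
```

In this sketch the instruction address may run on past the two stored locations without any offset update; the unit then simply reports itself as disabled and supplies No-ops.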

The invention is not limited to repeated execution of loops of instructions. For
example, programs that contain if-then-else clauses which affect only part of the groups of
functional units 70a-c (the other groups executing the same instructions both in case of the
"then" clause and in case of the "else" clause) may also use the invention. In this case the
offset for the memory unit 12a-c of a selected group 70a-c may be updated dependent on the "if"
condition, at a time when the instruction address for the other groups of functional units runs
on. Thus, no extra instructions have to be provided for these other functional units.

Although all groups of functional units 70a-c have been shown without
distinctions, it will be understood that the groups may in fact differ: functional units in some
groups may receive literal data, such as branch addresses or constants, from memory units
12a-c, whereas others merely receive operation codes, data being supplied from register file
72; some groups may receive larger numbers of operands than others, or produce larger
numbers of results.

Furthermore, although separate memory units 12a-c have been shown for 
respective groups of functional units 70a-c, it will be understood that some groups may share 
a memory unit 12a-c, so that the memory unit produces instructions for these groups in 
parallel (in general these memory units will have wider instruction output than other ones of 
memory units 12a-c). 

Also, although an offset register has been shown as a simple way to implement 
address modification, it will be understood that, without deviating from the invention, other 
ways may be used to translate addresses for part of the memory units 12a-c that supply 
instructions from the instruction word.

Figure 5, for example, shows one or more memory management units 500 that
are conventional per se, and which are used between the instruction address counter unit 74 and
some of memory units 12a-c, to provide for instruction address dependent translation
(without deviating from the invention memory management units may be used for only one
memory unit 12a-c or for all memory units 12a-c). In this case, the translation for different
memory units may differ, so that mutually different first and second instruction addresses
are translated to the same physical address for part of the memory units, but to mutually
different physical addresses for other memory units. In this way the examples that have been
given above can be implemented without the need for modification update instructions in the
program, by altering the address translation dependent on the common instruction address
counter value. This instruction address counter (also called program counter) value runs on
through successive iterations of loops that affect part of the functional units. From the value
of the instruction address it can be determined whether a particular group of functional units
70a has to execute an instruction from a loop, and if needed even the iteration number can be
determined. Accordingly, the memory management unit 500 repeatedly translates the
instruction address for the relevant memory unit so as to repeatedly fetch the same
instruction. It may be noted that in this case the modification of translation may even be
detailed at the single instruction address level, so that a group 70a-c could execute instructions
from the loop part of the time, while functional units from that same group 70a-c could
execute progressive instructions the remainder of the time.
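
The instruction address dependent translation of Figure 5 can be sketched in Python as follows, for a loop with a predetermined number of repetitions. The function name translate, its parameters and the identity mapping outside the loop region are assumptions made for this sketch; an actual memory management unit may use any other form of mapping.

```python
from typing import Optional, Tuple


def translate(instruction_address: int, region_start: int, body_start: int,
              body_len: int, iterations: int) -> Tuple[int, Optional[int]]:
    """Translate a common instruction address for the looping memory unit.

    Addresses inside the region covered by the loop repetitions are folded
    onto the stored loop body; the iteration number follows from the address.
    Addresses outside the region are passed through unchanged (assumption).
    """
    region_end = region_start + body_len * iterations
    if region_start <= instruction_address < region_end:
        relative = instruction_address - region_start
        return body_start + relative % body_len, relative // body_len
    return instruction_address, None


# Loop body of 2 instructions stored at addresses 2 and 3, repeated 3 times
# while the instruction address runs from 2 to 7.
mapping = [translate(a, region_start=2, body_start=2, body_len=2, iterations=3)
           for a in range(2, 9)]
```

In this sketch the instruction addresses 2, 4 and 6 are all translated to physical address 2 for the looping memory unit, whereas the other memory units would receive mutually different physical addresses for them.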

However, a program controlled offset register is less complex to implement
than a memory management unit and has the advantage of supporting simple data dependent
control, but for implementing loops with predetermined numbers of repetitions such memory
management units work as well. Of course, without deviating from the invention other
implementations may be used, such as offset counters, which periodically update the offset
autonomously. Also combinations of offset registers and memory management units may be
used, etc.

In practice the processing apparatus may use pipelining of instruction
execution. That is, in the same instruction cycle controller 14 may process one instruction
address, memory units 12a-c may retrieve instructions for a preceding instruction address and
functional units 18a-e may process one or more processing stages for one or more yet further
preceding instruction addresses. In this case, application of the offset from offset register 15b
may also be pipelined.

CLAIMS:

1. A data processing apparatus, the apparatus comprising

an instruction addressing unit; 

an instruction memory system arranged to output an instruction word, capable 
of containing a plurality of instructions, in response to an instruction address from the 
instruction addressing unit, the instruction memory system comprising a plurality of memory 
units, arranged to output respective parts of the instruction word in parallel; 

an instruction execution unit, comprising a plurality of functional units, each 
capable of executing a respective instruction from the instruction word in parallel with 
execution of other instructions from the instruction word by other ones of the functional 
units; 

an instruction address modification circuit arranged to modify translation of 
the instruction address into a physical address for a particular one of the memory units 
relative to other ones of the memory units during program execution. 

2. A data processing apparatus according to Claim 1, wherein the instruction
address modification circuit is arranged to modify the translation under control of a
modification update instruction from the instruction word during program execution.

3. A data processing apparatus according to Claim 2, wherein the particular one
of the memory units is arranged to supply instructions exclusively to a group that contains a
subset of the functional units, the group containing a modification update functional unit
constructed to execute the modification update instruction.

4. A data processing apparatus according to Claim 2, wherein the particular one
of the memory units is arranged to supply instructions exclusively to a group that contains a
subset of the functional units, the functional units comprising a modification update
functional unit outside the group constructed to execute the modification update instruction.



PHNL020979EPP 10.10.2002

5. A data processing apparatus according to Claim 2, wherein the modification
update instruction is a conditional instruction, the modification update being executed
dependent on fulfilment of a condition specified in the modification update instruction.

6. A data processing apparatus according to Claim 1, wherein the instruction
address modification circuit is arranged for instruction address and memory unit dependent
address translation, so that a first and a second instruction address are translated to a same
physical address for the particular one of the memory units and to mutually different physical
addresses for one or more memory units other than the particular one of the memory units.



7. A data processing apparatus according to Claim 1, programmed to use
repeated modification of said translation to repeatedly output one or more instructions
making up a loop of instructions, while the instruction address progresses so that the
instructions from the loop are combined in the instruction words with progressive instructions
that are not repeated during at least part of the repetitions of supply of instructions from the
loop.

8. A data processing apparatus according to Claim 2, programmed to use said
modification update instruction to selectively output, dependent on a data dependent
condition, a first or a second block of one or more instructions from said particular one of the
memory units, while the instruction address progresses so that at least part of the memory
units output one or more instructions from a third block of instructions as part of the
instruction word or words in combination with instructions from said first or second block.

9. A data processing apparatus according to Claim 1, wherein a first number of
addressable instruction addresses of the particular one of the memory units differs from a
second number of addressable instruction addresses of at least one of the memory units.

10. A data processing apparatus according to Claim 9, wherein the particular one
of the memory units is arranged to switch to a power saving state when the modified
instruction address is outside an address range or set of address ranges that contains said first
number of instruction addresses.



11. A method of executing a program of instruction words with a data processing
apparatus that comprises a plurality of functional units capable of executing a plurality of
instructions from each instruction word in parallel, wherein the instructions from each of at
least some of the instruction words are fetched from respective memory units in parallel, the
method comprising

addressing the instruction word with an instruction address that is common for 
the functional units; 

using a modifiable translation of the instruction address into a physical address 
for a particular one of the memory units to select dependent on program execution which 
instructions from the memory units will be combined into the instruction word in response to 
the instruction address. 

12. A method of executing a program of instruction words according to Claim 11, 
wherein the modifiable translation is selected under control of a modification update 
instruction in the program. 

13. A method of executing a program of instruction words according to Claim 11,
comprising using the modifiable translation to repeatedly fetch instructions from a loop from repeated
physical addresses in a particular one of the memory units in response to progressive
instruction addresses, so that the instructions from at least part of the repetitions of the loop are
combined in the instruction words with progressively different instructions from memory units
other than the particular one of the memory units.

14. A computer program product comprising instruction words, each for execution
in one or more respective instruction cycles by a data processing apparatus according to
Claim 2, the instruction words comprising at least one modification update instruction for
causing the data processing apparatus to execute, upon addressing a first one of the
instruction words, a combination of instructions from the first one of the instruction words
with further instructions from outside the first instruction word, following execution of the
modification update word.

ABSTRACT:

A program of instruction words is executed with a VLIW data processing
apparatus. The apparatus comprises a plurality of functional units capable of executing a
plurality of instructions from each instruction word in parallel. The instructions from each of
at least some of the instruction words are fetched from respective memory units in parallel,
addressed with an instruction address that is common for the functional units. Translation of
the instruction address into a physical address can be modified for one or more particular
ones of the memory units. Modification is controlled by modification update instructions in
the program. Thus, it can be selected dependent on program execution which instructions
from the memory units will be combined into the instruction word in response to the
instruction address.